fix(core): add a chunk timeout when processing llm stream (#16366)
Open
Contributor (author): Nevermind, just force-pushing over this PR.
It's possible for an SSE stream to stall. We handle some of the cases, but when reading the SSE stream from the provider, we are currently prone to stalls when the OS freezes network requests (due to going to sleep or other reasons) and the provider closes the request in the meantime. When the OS unfreezes the network request, the SSE stream hangs because it never receives any notification that the other end has closed it.
(The above may not be 100% correct; I am not super familiar with the specifics of TCP when it comes to SSE. The connection sometimes does seem to notice that the other end has closed, but I am able to reliably reproduce a stalled SSE connection by forcing the OS to freeze network requests, waiting a while, and unfreezing them.)
We need to time out if we don't receive a chunk from the SSE stream after a period of time. It's currently set to 2 minutes, which is quite high; this should handle the case of a stalled connection with no impact on normal slow SSE streams. Note that this does NOT time out the initial fetch that obtains the SSE stream; that may take a while, but once streaming starts, 2 minutes should be a very generous upper bound (I scanned some other projects; some set it to 30 seconds).
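The per-chunk timeout can be sketched roughly as a wrapper around the provider's chunk iterator. This is a minimal illustration, not the PR's actual code; the name `withChunkTimeout` and its shape are my own:

```typescript
// Sketch: wraps an async iterable so that waiting longer than `timeoutMs`
// between chunks rejects, unblocking a stalled SSE read.
// (Assumption: names and structure here are illustrative, not from the PR.)
async function* withChunkTimeout<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // A promise that rejects if no chunk arrives within timeoutMs.
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`no chunk received for ${timeoutMs}ms`)),
        timeoutMs,
      );
    });
    try {
      // Race the next chunk against the timeout.
      const result = await Promise.race([it.next(), timeout]);
      if (result.done) return;
      yield result.value;
    } finally {
      // Always clear the timer, whether a chunk arrived or the race rejected.
      clearTimeout(timer);
    }
  }
}
```

In practice you would likely also abort the underlying fetch (e.g. via an `AbortController`) when the timeout fires, so the dead connection is actually torn down rather than just abandoned.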
AI SDK has this feature: https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text#timeout. However, they added it in v6 and we are on v5; in any case, it is already a pattern for us to handle timeouts ourselves so we can control them.
I originally added this in the processor loop, but I quickly realized we can't do it there; we need to do it at a much lower level. The processor loop may legitimately wait a while if tools are running, especially for things like subagents. Instead, we need to apply the timeout directly on every call to /messages, which works because each such call is a direct start-stream-finish turn. This PR does that. It also adds a new chunkTimeout option in the providers config, so users can customize it.

Testing
I manually tested this by running complex conversations that invoked long-running tools and subagents. Applying the timeout at this low level works because each fetch to the provider endpoint starts a stream that ends once its chunks are fully streamed. I added extensive logging and everything looked correct. I also tested overriding this config per model.
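For illustration, a per-provider override of the new chunkTimeout option might look something like this (the exact config shape, provider key, and units are assumptions, not taken from the PR):

```json
{
  "provider": {
    "anthropic": {
      "options": {
        "chunkTimeout": 120000
      }
    }
  }
}
```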